Abstract. This is an expository paper presenting various ways of transforming
dependent models into independent ones and displaying applications in a variety
of contexts including reliability modelling, life testing, and nonparametric
estimation in the study of competing risks.

1.  Introduction. 

The  central  theme  of  this  survey  is  the  transformation  of  dependent  models 
into  independent  ones.  By  "dependent  (independent)  models"  we  mean  multivariate 
stochastic models whose joint probability distributions are distributions of
dependent  (independent)  random  variables.  Each  of  the  transformations  discussed 
here  can  be  used  to  convert  the  original  dependent  model  into  an  independent 
model  which  is  equivalent  (in  a specified  sense)  to  the  original  model.  It  is 
the  purpose  of  this  paper  to  present: 

(a)  some  key  theorems  upon  which  such  transformations  are  based,  and 

(b)  a variety  of  applications  in  reliability  and  biometry. 

We  do  not  give  formal  proofs  of  the  results  presented;  these  may  be  found 
in  the  original  papers  cited.  Rather,  we  motivate  the  key  ideas  by  examining 
important  special  cases  and  several  illustrative  examples. 

2.  Distributions  with  exponential  and  proportional  hazard  minima. 

In  this  section  and  the  next  we  describe  methods  for  converting  dependent 
models  into  independent  ones  based  upon  the  assumption  that  the  joint  distri- 
bution of  the  random  variables  in  the  original  (dependent)  model  belongs  to  a 
specified  family  of  distributions. 

We begin with some terminology and notation. A life length $T$ is a
nonnegative random variable such that $\lim_{t \to \infty} P(T > t) = 0$. Suppose that a
system consists of $n$ components with random life lengths $T_1, \ldots, T_n$. We
say that the system is a series (parallel) system if the failure of the system
coincides with the earliest (latest) component failure. Thus, the life length
of the corresponding series (parallel) system is given by $\min(T_i, 1 \le i \le n)$
[$\max(T_i, 1 \le i \le n)$]. Series and parallel systems are examples of more general
systems in reliability known as coherent systems [see Birnbaum, Esary, and
Saunders (1961) or Barlow and Proschan (1975)].

A random vector $(T_1, \ldots, T_n)$ has exponential minima if $\min(T_i, i \in I)$ is
exponentially distributed for every nonempty subset $I$ of $\{1, \ldots, n\}$. In
reliability terms, a random vector has exponential minima if the life length
of every series subsystem (i.e., every series system which may be formed by
using a subset of the $n$ components) is exponentially distributed. In
particular, individual components (series systems of size one) have exponential
life lengths. The ($n$-dimensional) random vectors $\mathbf{T}$ and $\mathbf{U}$ are marginally
equivalent in minima ($\mathbf{T} \sim \mathbf{U}$, in symbols) if $\min(T_i, i \in I)$ and $\min(U_i, i \in I)$
have the same distribution for each nonempty $I \subset \{1, \ldots, n\}$.

A particularly  important  multivariate  distribution  with  exponential  minima 
is  the  multivariate  exponential  (MVE)  of  Marshall  and  Olkin  (1967).  The  classic 
paper  of  Marshall  and  Olkin  (1967)  and  the  model  derived  therein  have  prompted 
numerous  investigations  [see  the  annotated  bibliography  of  Kotz  (1974)].  The 
following  characterization  of  the  MVE  is  a particularly  useful  one. 

Theorem 2.1. A random vector $(U_1, \ldots, U_n)$ has the ($n$-dimensional) MVE
distribution if and only if there exists a collection $\{H_J, J \in \mathcal{J}\}$, $\bigcup_{J \in \mathcal{J}} J = \{1, \ldots, n\}$,
of independent exponential random variables such that $U_i = \min(H_J, J \in \mathcal{J}, i \in J)$,
$i = 1, \ldots, n$.
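The construction in Theorem 2.1 can be illustrated numerically. The following is a minimal sketch, not taken from the paper: the function names and the three shock rates are hypothetical, chosen only for illustration. It draws a bivariate vector from independent exponential shock times and compares the simulated rate of the series-system minimum with the sum of the rates of the shocks that affect it.

```python
import random

def sample_mve(rates, n, rng):
    """Draw one vector (U_1, ..., U_n) with U_i = min(H_J : i in J).

    `rates` maps a frozenset J to the rate of the exponential shock time H_J;
    the union of the sets J must cover {1, ..., n}.
    """
    shocks = {J: rng.expovariate(lam) for J, lam in rates.items()}
    return [min(h for J, h in shocks.items() if i in J) for i in range(1, n + 1)]

def minimum_rate(rates, I):
    """Rate of min(U_i, i in I): the sum of the rates of all shocks that hit I."""
    return sum(lam for J, lam in rates.items() if J & I)

# Hypothetical rates: a shock to component 1, to component 2, and to both.
rates = {frozenset({1}): 0.5, frozenset({2}): 1.0, frozenset({1, 2}): 0.25}
rng = random.Random(0)
draws = [sample_mve(rates, 2, rng) for _ in range(20000)]
mean_min = sum(min(u) for u in draws) / len(draws)

# The reciprocal of the sample mean should be close to the theoretical rate.
print(minimum_rate(rates, frozenset({1, 2})), 1.0 / mean_min)
```

The minimum of independent exponential variables is again exponential with the summed rate, which is why every subset minimum of such a vector is exponential.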

Theorem 2.1 is an immediate extension of Theorem 3.2 of Marshall and
Olkin (1967). To simplify notation we shall adopt the following conventions
throughout the remainder of this paper. Let $\mathcal{I}$ denote the collection
of all nonempty subsets of $\{1, \ldots, n\}$. Whenever an element $\{i_1, \ldots, i_m\}$
of $\mathcal{I}$ appears as a subscript, as, for example, in $H_{\{i_1, \ldots, i_m\}}$, we
shall often write instead $H_{i_1 \cdots i_m}$. We say that two random
variables $X$ and $Y$ are stochastically equal and write $X \stackrel{st}{=} Y$, if $X$ and $Y$
have the same probability distribution. Unless otherwise indicated, all random
vectors are assumed to be $n$-dimensional.

Esary  and  Marshall  (1974)  prove  the  following: 

Theorem 2.2. Suppose that a random vector $\mathbf{T}$ has exponential minima. Then
there exists a random vector $\mathbf{U}$ with the MVE distribution of Marshall and
Olkin such that $\mathbf{T} \sim \mathbf{U}$.

Theorem  2.2  can  be  used  to  obtain  a consistent  estimator  for  system 
reliability  as  the  following  example  illustrates: 

Example 2.3. Suppose that an estimate of system reliability for an arbitrary
coherent system of $n$ components is desired prior to manufacture of the system.
Suppose also that the only available failure data are for $n$-component
parallel systems whose component life lengths have the same joint distribution
as those of the given system. If component life lengths have exponential
minima, then by Theorem 2.2 there is a random vector $\mathbf{U}$ with the MVE distribution
such that $\mathbf{T} \sim \mathbf{U}$. Consistent estimators for the parameters of the MVE,
given the failure data from parallel systems as above, have been obtained by
Proschan and Sullo (1976). Since the reliability of the system can be expressed
as a continuous function of the survival probabilities $P[\min(T_i, i \in I) > t]$, $I \in \mathcal{I}$
[see Esary and Marshall (1970)], we can replace $P[\min(T_i, i \in I) > t]$ by an
estimator for $P[\min(U_i, i \in I) > t]$ given by Proschan and Sullo (1976) and
thus obtain a consistent estimator for system reliability.

In view of Theorem 2.1, we can state Theorem 2.2 in the following equivalent
form:

Theorem 2.3. Suppose that a random vector $\mathbf{T}$ has exponential minima. Then
there exists a collection $\{H_J, J \in \mathcal{J}\}$, $\bigcup_{J \in \mathcal{J}} J = \{1, \ldots, n\}$, of independent
exponential random variables such that for each $I \in \mathcal{I}$,

$$\min(T_i, i \in I) \stackrel{st}{=} \min(H_J, J \in \mathcal{J}, J \cap I \ne \emptyset).$$


Consider, for example, a series system whose component life lengths
$T_1, \ldots, T_n$ have exponential minima and are mutually dependent. If we view the
independent random variables $\{H_J\}$ of Theorem 2.3 as the component life lengths
in  a new  series  system,  then,  in  effect,  Theorem  2.3  allows  us  to  transform  a 
dependent  model  into  an  independent  one,  while  preserving  the  life  distribution 
of  the  original  system.  Under  different  conditions  we  shall  see  in  Section  4 
how  to  transform  a dependent  model  into  an  independent  one,  while  preserving 
not  only  system  life  length,  but  also  the  probabilities  of  occurrence  of  certain 
"failure  patterns".  We  shall  also  see  that  such  transformations  from  dependent 
to  independent  models  are  not  only  of  interest  in  their  own  right,  but  also 
have  important  statistical  applications  to  the  theory  of  competing  risks  and 
to  statistical  life  testing,  in  general. 

Esary and Marshall (1974) establish existence only in their proof of
Theorem 2.3 by using the special nature of coherent systems in reliability
theory. The proof of Theorem 2.3 given by Langberg, Proschan, and Quinzi
(1977a) is considerably more elementary and specifies explicitly the distributions
of the independent random variables $\{H_J\}$ as follows. Suppose that
$P[\min(T_i, i \in I) > t] = \exp(-\mu_I t)$, $I \in \mathcal{I}$, for some collection $\{\mu_I, I \in \mathcal{I}\}$ of
positive constants. Then the random variable $H_J$ in Theorem 2.3 is exponentially
distributed with parameter $\lambda_J$ given by

$$\lambda_J = (-1)^{\#(J)-1} \Big[ \mu_{1 \cdots n} - \sum_{i \in J} \mu_{\{1, \ldots, n\} \setminus \{i\}} + \sum_{i_1 < i_2;\ i_1, i_2 \in J} \mu_{\{1, \ldots, n\} \setminus \{i_1, i_2\}} - \cdots + (-1)^{\#(J)} \mu_{\bar{J}} \Big], \tag{2.1}$$

where $\#(J)$ is the cardinality of $J$, $\bar{J}$ is the complement of $J$ in
$\{1, \ldots, n\}$, and $\mu_\emptyset$ is read as 0. Formula (2.1) provides an explicit solution for the parameters
$\{\lambda_J\}$ in terms of the known constants $\{\mu_I\}$. Formula (2.1) also indicates
ways of testing the validity of the assumption of exponential minima as the
following examples illustrate.

First, suppose it is known a priori that, due to the structure of a
particular system, it is impossible for the components in some subset $J$
(generally, some collection of subsets) to fail simultaneously.
This is equivalent to assuming that the corresponding parameter $\lambda_J = 0$. If
the corresponding linear combination of (known) constants $\{\mu_I\}$ given by (2.1)
does not yield $\lambda_J = 0$, then the assumption of exponential minima must be
wrong. Similarly, if some combination of the $\{\mu_I\}$ yields a $\lambda_J$ which is
negative, then the assumption of exponential minima is likewise incorrect.

More generally, formula (2.1) indicates a heuristic method for testing
the statistical hypothesis of exponential minima. For example, consider a
two-component system with component life lengths $T_1$ and $T_2$. If $P[\min(T_i, i \in I) > t]$
$= \exp(-\mu_I t)$, $I \in \mathcal{I}$, then by (2.1),

$$\lambda_1 = \mu_{12} - \mu_2 \ge 0,$$
$$\lambda_2 = \mu_{12} - \mu_1 \ge 0,$$
$$\lambda_{12} = \mu_1 + \mu_2 - \mu_{12} \ge 0.$$

Consequently, we would expect that estimates for the $\mu_I$'s, together with an
allowance for random error, would satisfy a similar set of inequalities. If
not, we would tend to doubt the hypothesis of exponential minima.
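The check described above can be sketched in code. The following is a minimal illustration under stated assumptions: the function names are hypothetical, the rates $\mu_I$ are treated as known rather than estimated, and the alternating sum is our reading of formula (2.1) with $\mu_\emptyset$ taken as 0. It is not a statistical test, since it makes no allowance for random error.

```python
from itertools import combinations

def subsets(items):
    """All subsets (including the empty one) of an iterable, as frozensets."""
    items = list(items)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]

def shock_rates(mu, n):
    """Apply formula (2.1): lambda_J for every nonempty J in {1, ..., n}.

    `mu` maps each nonempty frozenset I to mu_I, the exponential rate of
    min(T_i, i in I); mu of the empty set is taken to be 0.
    """
    full = frozenset(range(1, n + 1))
    lam = {}
    for J in subsets(full):
        if not J:
            continue
        total = 0.0
        for A in subsets(J):
            rest = full - A  # the index set {1, ..., n} with A removed
            total += (-1) ** len(A) * (mu[rest] if rest else 0.0)
        lam[J] = (-1) ** (len(J) - 1) * total
    return lam

def consistent_with_exponential_minima(mu, n, tol=1e-12):
    """Necessary check: every lambda_J from (2.1) must be nonnegative."""
    return all(value >= -tol for value in shock_rates(mu, n).values())

# Hypothetical two-component rates: mu_1 = 0.75, mu_2 = 1.25, mu_12 = 1.75.
mu = {frozenset({1}): 0.75, frozenset({2}): 1.25, frozenset({1, 2}): 1.75}
print(shock_rates(mu, 2))
print(consistent_with_exponential_minima(mu, 2))
```

For the hypothetical rates above the three inequalities hold; replacing $\mu_{12}$ by a value larger than $\mu_1 + \mu_2$ makes $\lambda_{12}$ negative and the check fails.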

Employing the same technique of proof used to prove Theorem 2.3, Langberg,
Proschan, and Quinzi (1977a) [hereafter referred to as LPQ (1977a)] obtain a
generalization of Theorem 2.3. First we introduce some terminology. The
hazard function $R$ associated with the distribution function $F$ of a nonnegative
random variable is the function $R(t) = -\log[1 - F(t)]$, $t \ge 0$. The (multivariate
distribution of the) nonnegative random vector $\mathbf{T}$ has proportional hazard
minima if there exists a collection $\{\mu_I, I \in \mathcal{I}\}$ of positive constants such
that $P[\min(T_i, i \in I) > t] = \exp[-\mu_I R(t)]$, $I \in \mathcal{I}$, where $R(\cdot)$ is a hazard
function, i.e., a nonnegative, nondecreasing function satisfying $R(0) = 0$
and $R(\infty) = \infty$. LPQ (1977a) prove the following:

Theorem 2.4. Suppose that a random vector $\mathbf{T}$ has proportional hazard minima
with hazard function $R$. Suppose further that $R$ is continuous at $t_0 =
\sup\{t: R(t) = 0\}$. Then there exists a collection $\{H_J, J \in \mathcal{J}\}$, $\bigcup_{J \in \mathcal{J}} J = \{1, \ldots, n\}$,
of independent random variables with hazard functions proportional to $R(\cdot)$
such that for each $I \in \mathcal{I}$,

$$\min(T_i, i \in I) \stackrel{st}{=} \min(H_J, J \in \mathcal{J}, J \cap I \ne \emptyset).$$

Furthermore, the constants of proportionality $\{\lambda_J, J \in \mathcal{J}\}$ are given by (2.1).

Remark 2.5. Note that in the special case $R(t) = t$, $t \ge 0$, the conclusion
of Theorem 2.4 holds by Theorem 2.3.

The assumption of proportional hazard minima holds, for example, when the
random vector $\mathbf{T}$ has minima with the Weibull distribution having a fixed shape
parameter, so that $R(t) = t^\alpha$ for a common $\alpha > 0$.

3.  Additive  families  of  distributions. 

Again using the same technique of proof as in the proof of Theorem 2.3,
LPQ (1977a) obtain a result for additive families of distributions which is
analogous to Theorem 2.3 except that "sum" plays the role of "minimum".

Let $\mathcal{F} = \{F_\theta, \theta \in \Theta\}$ be a family of distributions parameterized by $\theta$,
and let $g$ be a binary operation on the set of real numbers. The collection
$\mathcal{F}$ is said to form an additive family with respect to $g$ if for every
$\theta_1, \theta_2 \in \Theta$: $X_1$ and $X_2$ are independent random variables with respective
distributions $F_{\theta_1}$ and $F_{\theta_2}$ implies that $g(X_1, X_2)$ has distribution $F_{\theta_1 + \theta_2}$.
In Theorem 2.3, the family of interest is the additive family of exponential
distributions, $F_\theta(x) = 1 - \exp(-\theta x)$, with respect to $g(x_1, x_2) = \min(x_1, x_2)$.
The following result applies to additive families $\{F_\theta, \theta \in \Theta\}$ with respect
to $g(x_1, x_2) = x_1 + x_2$, where $\int x \, dF_\theta(x) = \theta$.


Theorem 3.1. For each $I \in \mathcal{I}$, let the random variable $T_I$ have distribution
$F_{\mu_I} \in \mathcal{F} = \{F_\mu, \mu \in M\}$, where $\mathcal{F}$ is an additive family of distributions with
respect to $g(x_1, x_2) = x_1 + x_2$ satisfying $\int x \, dF_\mu(x) = \mu$. Then there exists
a collection $\{S_J, J \in \mathcal{J}\}$ of independent random variables such that the
distribution of $S_J$ belongs to $\mathcal{F}$, $J \in \mathcal{J}$, and for each $I \in \mathcal{I}$,

$$T_I \stackrel{st}{=} \sum_{J \in \mathcal{J}:\ J \cap I \ne \emptyset} S_J.$$

Furthermore, the mean $\lambda_J$ of the random variable $S_J$ is given by (2.1).

Examples of families satisfying the hypothesis of Theorem 3.1 are the
Poisson family with mean $\mu$ and the gamma family with mean $\mu$ and unit scale
parameter. A sample application in reliability follows.


Example 3.2. Suppose that an $n$-component system is exposed to shocks which
are not necessarily fatal. For each $I \in \mathcal{I}$, a shock of type $I$ simultaneously
affects all components exclusively in subset $I$. For example, the shock pattern
for a two-component system might be exhibited as in Figure 3.1 below:

[Figure 3.1. Rows labeled Component 1, Component 2, and Distinct shocks to system, with shocks marked by asterisks along a Time axis.]

Figure 3.1 indicates that a total of 5 distinct shocks occurred in the interval
$[0, t]$: 2 shocks affected component 1 alone, 1 shock affected component 2
alone, and 2 shocks affected both components simultaneously. Let $N_I(t)$ be
the number of shocks in the interval $[0, t]$ which are simultaneously received
by the components exclusively in subset $I$. In Figure 3.1, $N_1(t) = 2$, $N_2(t) = 1$,
$N_{12}(t) = 2$. Let $K_I(t)$ be the number of distinct (in time) shocks in the
interval $[0, t]$ which are received by the components in subset $I$. In Figure
3.1, $K_1(t) = 4$, $K_2(t) = 3$, $K_{12}(t) = 5$. Note that, in general,

$$K_I(t) = \sum_{J:\ J \cap I \ne \emptyset} N_J(t),$$

so that the processes $\{K_I(t), t \ge 0\}$, $I \in \mathcal{I}$, are generally dependent. If
$\{K_I(t), t \ge 0\}$ is a Poisson process with intensity $\mu_I > 0$, $I \in \mathcal{I}$, then we
conclude from Theorem 3.1 that there exists a collection $\{\{N^*_J(t), t \ge 0\}, J \in \mathcal{J}\}$
of independent Poisson processes such that for every $I \in \mathcal{I}$,

$$K_I(t) \stackrel{st}{=} \sum_{J \in \mathcal{J}:\ J \cap I \ne \emptyset} N^*_J(t).$$

Furthermore, the intensity $\lambda_J$ of the process $\{N^*_J(t), t \ge 0\}$ is given by
(2.1).
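The bookkeeping in Example 3.2 is easy to reproduce. The short sketch below (the function name is ours, not the paper's) recomputes the counts $K_I(t)$ of Figure 3.1 from the exclusive counts $N_J(t)$ by summing over all $J$ that intersect $I$.

```python
from itertools import combinations

def hit_counts(exclusive_counts, n):
    """Return K_I = sum of N_J over all J that intersect I, for each nonempty I."""
    components = range(1, n + 1)
    result = {}
    for k in range(1, n + 1):
        for combo in combinations(components, k):
            I = frozenset(combo)
            result[I] = sum(c for J, c in exclusive_counts.items() if J & I)
    return result

# Counts read off Figure 3.1: N_1(t) = 2, N_2(t) = 1, N_12(t) = 2.
N = {frozenset({1}): 2, frozenset({2}): 1, frozenset({1, 2}): 2}
K = hit_counts(N, 2)
print(K[frozenset({1})], K[frozenset({2})], K[frozenset({1, 2})])  # prints: 4 3 5
```

Because $N_{12}(t)$ enters every $K_I(t)$, the counts $K_I(t)$ share a common term, which is the source of their dependence.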

4. Preserving system life length and failure patterns.

It  is  desirable  to  have  methods  for  converting  dependent  models  into 
independent  ones  which  preserve  essential  features  of  the  original  (dependent) 
model.  For  example,  in  the  case  of  a series  system  whose  component  life  lengths 
have exponential minima, Theorem 2.3 allows us to convert a dependent model
into  an  independent  one,  while  preserving  the  system  life  length,  i.e.,  the 
minimum  of  the  component  life  lengths.  In  this  section  we  show  how,  under 
more  general  conditions,  it  is  possible  to  convert  a dependent  model  into  an 
independent  one,  while  preserving  features  of  the  original  model  in  addition 
to  system  life  length. 

Consider  an  arbitrary  series  system  of  n components.  In  many  practical 
applications  we  are  able  to  observe: 

(1)  the  time  at  which  the  system  fails,  and 

(2)  the  identity  of  the  component  or  set  of  components  which  fails. 

Note  that  although  we  use  the  language  of  reliability  theory  (series  system, 
component,  etc.),  the  general  model  has  application  in  a variety  of  contexts. 

For example, in population mortality studies, the data on each subject include
(1) the age at death and (2) the cause of death. Suppose that an individual
dies due to one of $n$ possible causes. An individual, in this context, can
be viewed as an $n$-component series system which fails (dies) due to the occurrence of
one or more of the $n$ possible causes. As another example, suppose a personnel
study is undertaken to determine the departure patterns of employees in a large
company. The data on each employee might consist of (1) the length of stay,
i.e., the time from arrival to termination, and (2) the reason for termination.
In general, one could imagine any model where observations include (1) the time
at which a particular event occurs and (2) the identity of the cause or set of
causes (among a finite number) which results in the occurrence of the event.
Moreover, one or more of the "causes" might be identified with the withdrawal
of a unit from observation, resulting in censored or truncated data. For
convenience we continue to employ the language of reliability theory, although
the model is applicable in many other situations.

In  this  section  we  show  how  it  is  possible  to  replace  a series  system  of 
dependent  components  by  a series  system  of  independent  components  while 
simultaneously  preserving 

(1') the distribution of the time to system failure, and

(2') the probability of occurrence of each failure pattern.

By  "failure  pattern"  we  mean,  in  the  case  of  a series  system,  the  failure  of  a 
set  of  components  whose  simultaneous  failure  causes  (i.e.,  coincides  with) 
the  failure  of  the  system. 

We begin with some terminology and notation. If $\mathbf{T}$ is the vector of
component life lengths in an $n$-component system, we say that failure pattern $I$
occurs, and write $\xi(\mathbf{T}) = I$, if the simultaneous failures of the components
exclusively in subset $I$ coincide with the failure of the system. Let $\mathbf{S}$ and
$\mathbf{T}$ be the vectors of component life lengths of two systems with respective life
lengths $S$ and $T$. We say that the systems are equivalent in life lengths and
patterns ($\mathbf{S} \stackrel{LP}{=} \mathbf{T}$, in symbols) if

$$P(S > t, \xi(\mathbf{S}) = I) = P(T > t, \xi(\mathbf{T}) = I)$$

for every $t \ge 0$ and every $I \in \mathcal{I}$.

Miller  (1977)  proves  the  following  existence  result: 

Theorem 4.1. Let $T_i$ be the life length of component $i$, $i = 1, \ldots, n$, and
let $T$ be the life length of the corresponding series system. Assume that the


functions 


have no common discontinuities and that $P(T_i = T_j) = 0$ for $i \ne j$. Then there
exists a vector $\mathbf{S}$ of independent random variables such that $\mathbf{T} \stackrel{LP}{=} \mathbf{S}$, and
at least one of the $S_i$ is a life length. The distributions of $S_1, \ldots, S_n$
are uniquely determined on $\{t: \bar{F}(t) > 0\}$, where $\bar{F}(t) = P(T > t)$, $t \ge 0$.

We can paraphrase Theorem 4.1 as follows. Under the given hypothesis,
the original (dependent) system with life length $T = \min(T_i, 1 \le i \le n)$ can
be replaced by a system with life length $S = \min(S_i, 1 \le i \le n)$, where the $S_i$'s
are independent random variables, in such a way that $\mathbf{S} \stackrel{LP}{=} \mathbf{T}$. Tsiatis (1975)
proves a similar result in the context of competing risk theory by assuming
that the joint distribution of $T_1, \ldots, T_n$ has continuous partial derivatives.
It is noteworthy that the nature of the dependence in the original model is
unspecified in Theorem 4.1, i.e., the original components might be dependent
in any way whatsoever. LPQ (1977b) show that the assumption of no common
discontinuities in Theorem 4.1 is a necessary as well as a sufficient condition
for the replacement of a dependent model by an independent one. Moreover, we
provide explicit expressions for the appropriate distributions in the
independent model.

Before presenting the LPQ (1977b) result, we motivate the theorem as
follows. Let $T = \min(T_i, 1 \le i \le n)$, where $T_1, \ldots, T_n$ are the (dependent)
component life lengths. Let $\bar{F}(t, I)$ be the joint probability that the system
survives beyond time $t$ and failure pattern $I$ occurs. For example, if $n = 3$,
$\bar{F}(t, \{1,2\}) = P(T > t, T_1 = T_2 < T_3)$, so that ties are possible. The problem
as posed in LPQ (1977b) can be stated as follows. Given the vector $\mathbf{T}$ of
(dependent) life lengths, determine a random vector $\mathbf{S}$ such that $\mathbf{S} \stackrel{LP}{=} \mathbf{T}$, where
$S_1, \ldots, S_n$ are expressible in terms of independent random variables. By
"expressible" we mean that the $S_i$ are either themselves independent random
variables or else can be expressed as functions of independent random variables.
The solution is found by letting $S_1, \ldots, S_n$ be the life lengths of components
in a theoretical $n$-component series system, where the components are exposed
to shocks according to the following shock model. Each component fails if
it receives a shock. Independent sources of shock are present in the environment,
one source for each $I \in \mathcal{I}$. A shock from source $I$ simultaneously kills the
components exclusively in subset $I$. Let $H_I$ denote the time (measured from
the origin) until a shock from source $I$ occurs. Then $S_i = \min(H_I, I \in \mathcal{I}, i \in I)$,
$1 \le i \le n$, and $S = H$, where $S = \min(S_i, 1 \le i \le n)$ and $H = \min(H_I, I \in \mathcal{I})$.

Define

$$\xi^*(\mathbf{H}) = \begin{cases} I, & \text{if } H_I < H_J \text{ for each } J \ne I \\ \emptyset, & \text{otherwise.} \end{cases}$$
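The shock model just described can be traced through a small example. The sketch below is illustrative only: the function name and the three shock times are hypothetical. It maps a set of shock times $H_I$ to the component life lengths $S_i$, the system life length, and the failure pattern, returning the empty set when the earliest shock time is tied.

```python
def system_from_shocks(shock_times, n):
    """Map shock times H_I to component lives S_i, system life S, and pattern.

    `shock_times` maps a frozenset I of component labels to the time H_I of
    the shock that kills exactly the components in I. Every component is
    assumed to belong to at least one such set.
    """
    lives = [min(h for I, h in shock_times.items() if i in I) for i in range(1, n + 1)]
    system_life = min(lives)
    earliest = min(shock_times.values())
    first = [I for I, h in shock_times.items() if h == earliest]
    # The pattern is I when H_I is strictly smaller than every other H_J;
    # a tie for the earliest shock gives the empty set.
    pattern = first[0] if len(first) == 1 else frozenset()
    return lives, system_life, pattern

# Hypothetical shock times for a two-component system.
H = {frozenset({1}): 2.0, frozenset({2}): 3.5, frozenset({1, 2}): 1.2}
print(system_from_shocks(H, 2))
```

Here the common shock arrives first, so both components fail together at the system failure time and the pattern is $\{1, 2\}$.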

To allow for simultaneous failures among the components in the original system,
we permit the dimension of the vector $\mathbf{H}$ to be greater than or equal to that
of $\mathbf{T}$. Generally, if $\mathbf{T}$ has dimension $n$, then the vector $\mathbf{H}$ of times until
shock has dimension (at most) $2^n - 1$. The subscripts on the components of $\mathbf{H}$
are understood to be ordered lexicographically. It follows that
$P(S > t, \xi(\mathbf{S}) = I) = P(T > t, \xi(\mathbf{T}) = I)$ if and only if

$$P(H > t, \xi^*(\mathbf{H}) = I) = P(T > t, \xi(\mathbf{T}) = I) \tag{4.1}$$

for each $t \ge 0$ and each $I \in \mathcal{I}$. If (4.1) holds for every subset $I$ of
$\{1, \ldots, n\}$, we write $\mathbf{H} \stackrel{LP}{=} \mathbf{T}$. The problem will be solved if we determine
independent random variables $H_I$, $I \in \mathcal{I}$, such that (4.1) holds for every $t \ge 0$
and every subset $I$ of $\{1, \ldots, n\}$. LPQ (1977b) prove the following:

Theorem 4.2. Let $T = \min(T_i, 1 \le i \le n)$ be the life length of an $n$-component
series system, where $T_i$ is the life length of component $i$, $i = 1, \ldots, n$. Define
$\bar{F}(t, I) = P(T > t, \xi(\mathbf{T}) = I)$ and $F(t, I) = P(T \le t, \xi(\mathbf{T}) = I)$, $I \in \mathcal{I}$. Let
$\bar{F}(t) = P(T > t)$ and $\alpha(F) = \sup\{x: \bar{F}(x) > 0\}$. Then the following statements
hold:

(i) A necessary and sufficient condition for the existence of a set of
independent random variables $\{H_I, I \in \mathcal{I}\}$ which satisfy $\mathbf{H} \stackrel{LP}{=} \mathbf{T}$, where
$H = \min(H_I, I \in \mathcal{I})$, is that the functions $F(\cdot, I)$, $I \in \mathcal{I}$, have no common
discontinuities in the interval $[0, \alpha(F))$.

(ii) The random variables $\{H_I, I \in \mathcal{I}\}$ in (i) have corresponding survival
probabilities $\{\bar{G}_I(\cdot), I \in \mathcal{I}\}$ which are uniquely determined on the interval
$[0, \alpha(F))$ as follows:

$$\bar{G}_I(t) = \exp\Big\{ -\int_0^t \frac{dF^c(u, I)}{\bar{F}(u)} \Big\} \cdot \prod_{a_j(I) \le t} \frac{\bar{F}(a_j(I))}{\bar{F}(a_j(I)^-)}, \quad 0 \le t < \alpha(F), \tag{4.2}$$

where $F^c(\cdot, I)$ is the continuous part of $F(\cdot, I)$, $\{a_j(I)\}_{j \ge 1}$ is the set
of discontinuities of $F(\cdot, I)$, $I \in \mathcal{I}$, and the product over an empty set is
defined as unity.

The  following  generalization  of  Theorem  4.2  for  arbitrary  (not  necessarily 
coherent)  systems  also  holds: 


Theorem 4.3. Let $T_i$ denote the life length of component $i$, $i = 1, \ldots, n$, in
an arbitrary $n$-component system with life length $T$. Define $F(\cdot, I)$ and
$\alpha(F)$ as in Theorem 4.2. Then (i) and (ii) of Theorem 4.2 hold.


Example 4.4. Suppose that the vector $(T_1, T_2)$ has a bivariate distribution
with survival probability:

$$\bar{F}(t_1, t_2) = (1 + t_1 + t_2)^{-1}, \quad t_1 \ge 0, \ t_2 \ge 0.$$

Note that $T_1$ and $T_2$ are mutually dependent. If $T_1$ and $T_2$ are the
component life lengths in a two-component series system, we may conclude
from (4.2) of Theorem 4.2 that the original system is equivalent in life
length and failure patterns to a system involving independent times $H_1$ and
$H_2$ until shock, where

$$\bar{G}_i(t) = P(H_i > t) = (1 + 2t)^{-1/2}, \quad i = 1, 2, \ t \ge 0.$$
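The closed form in Example 4.4 can be checked numerically. The following is a rough sketch under our own assumptions: the function names, the finite-difference step, and the trapezoidal grid are illustrative choices, not part of the paper. Since the joint distribution is continuous, the product in (4.2) is empty, and the exponent is the integral of the cause-specific hazard $-\partial \bar{F}(t_1, t_2)/\partial t_1$ evaluated at $t_1 = t_2 = u$, divided by $\bar{F}(u, u)$.

```python
import math

def joint_survival(t1, t2):
    """Bivariate survival function of Example 4.4."""
    return 1.0 / (1.0 + t1 + t2)

def cause_one_hazard(t, h=1e-6):
    """Cause-specific hazard of risk 1 at time t, by a central difference."""
    d1 = (joint_survival(t + h, t) - joint_survival(t - h, t)) / (2.0 * h)
    return -d1 / joint_survival(t, t)

def shock_survival(t, steps=2000):
    """exp(-integral of the hazard over [0, t]) using the trapezoidal rule."""
    if t == 0.0:
        return 1.0
    width = t / steps
    grid = [cause_one_hazard(k * width) for k in range(steps + 1)]
    integral = width * (sum(grid) - 0.5 * (grid[0] + grid[-1]))
    return math.exp(-integral)

# Compare the numerical value with the closed form (1 + 2t)^(-1/2).
for t in (0.5, 1.0, 3.0):
    print(t, shock_survival(t), (1.0 + 2.0 * t) ** -0.5)
```

The analytic calculation behind the check: the hazard is $(1 + 2u)^{-2} / (1 + 2u)^{-1} = (1 + 2u)^{-1}$, whose integral over $[0, t]$ is $\tfrac{1}{2} \log(1 + 2t)$.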

5. Applications to the theory of competing risks, life testing, and censored
data  problems. 

The  reader  will  note  that  in  every  model  thus  far  considered,  we  have 
obtained  explicit  expressions  for  the  appropriate  distributions  in  the 
independent  model,  whereas  existence  alone  is  proven  in  other  approaches. 

Thus,  our  results  are  not  only  more  general  but  also  more  readily  applicable, 
especially  when  explicit  solutions  are  called  for.  The  probabilistic  results 
obtained  are,  of  course,  of  interest  in  their  own  right  since  they  facilitate 
the  analysis  of  various  dependent  models.  However,  a significant  statistical 
payoff  is  also  derived  from  our  approach.  In  this  section  we  show  how  our 
probabilistic  solution  to  the  conversion  problem  of  Theorem  4.2  can  be  used  to 
unify  the  nonparametric  approach  to  estimation  problems  in  competing  risk  theory, 
life  testing,  and  certain  incomplete  data  problems. 

The  theory  of  competing  risks  derives  its  name  from  the  fact  that,  during 
a person’s  lifetime,  he  is  exposed  to  several  risks  of  death  (various  fatal 
diseases,  accidents,  etc.)  which  can  be  viewed  as  "competing"  for  his  life. 

A series system of $r$ dependent components with life length $T = \min(T_i, 1 \le i \le r)$
such that failure pattern $I$ occurs [$\xi(\mathbf{T}) = I$] becomes, in the terminology of
competing risks, an individual with life length $T = \min(T_i, 1 \le i \le r)$ exposed
to $r$ dependent risks of death, where $T_i$ is the age at death if risk $i$
were the only risk present in the environment, $1 \le i \le r$, and $\xi$ is the cause
of death, i.e., the subset $I$ of $\{1, \ldots, r\}$ such that $T = T_i$ for each $i \in I$
and $T \ne T_i$ for each $i \notin I$. When death results from a single cause, then $\xi$ is
the index $i$ for which $T = T_i$. [In an incomplete or censored data problem,
one of the random variables $T_i$ represents the time at which an individual
becomes "unobservable" for a reason other than death, while the remaining
variables typically represent various causes of death.] The biomedical
researcher is interested in making inferences about unobservable quantities
(viz., the random variables $T_1, \ldots, T_r$) by using data from observable
quantities, in this case the lifetime $T$ and cause $\xi$. In showing how
Theorem 4.2 may be applied, we focus on the following question: How can we
estimate the marginal survival probabilities corresponding to a given risk
(or combination of risks) operating alone without competition from the other
risks? That is, how can we estimate the $2^r - 1$ survival probabilities (so-called
"net probabilities") $\bar{M}_J(t) = P[\min(T_j, j \in J) > t]$, $J \subset \{1, \ldots, r\}$?

Throughout the remainder of this paper let $(T_{1i}, \ldots, T_{ri})$, $i = 1, \ldots, n$,
represent a random sample of size $n$ from the joint distribution of the
nonnegative random variables $T_1, \ldots, T_r$. To conform to the usual notation,
we reserve 'n' for sample size. Thus, let $\mathcal{I}$ now denote the collection of
all nonempty subsets of the set $\{1, \ldots, r\}$ of risks. We adopt all of the
previous notation subject to the substitution of 'r' for 'n'. For each
distribution $F$, let $\bar{F}$ ($= 1 - F$) be the corresponding survival function. For
each $I \in \mathcal{I}$, let $M_I(t) = P(T_I \le t)$, where $T_I = \min(T_i, i \in I)$. In the
competing risk model, only the following are observed:

$$T_{0i} = \min(T_{1i}, \ldots, T_{ri}), \quad 1 \le i \le n,$$

and

$$\xi_i, \quad 1 \le i \le n,$$

where $\xi_i = I$ if and only if $T_{0i} = T_{ji}$ for each $j \in I$ and $T_{0i} \ne T_{ji}$ for each $j \notin I$.
Let $0 = T'_{00} \le T'_{01} \le \cdots \le T'_{0n} = T_{\max}$ denote the ordered values of the
observations $T_{01}, \ldots, T_{0n}$.

Consider  now  the  following  assumptions: 

(A1) The risks (i.e., the random variables $T_1, \ldots, T_r$) are mutually independent.
(A2) No ties are possible, i.e., $P(T_i = T_j) = 0$ for each $i \ne j$, $i, j = 1, \ldots, r$.
(A3) The distributions of $T_1, \ldots, T_r$ have no common discontinuities.
(A4) The random variables $T_1, \ldots, T_r$ have a joint distribution which is
absolutely continuous.

Note  that  Assumptions  (A2)  and  (A3)  together  imply  that  death  results 
from  a single  cause. 

Assuming (A1), (A2), and (A3), Peterson (1975) examines the following
estimator for $\bar{M}_J$:

$$\hat{\bar{M}}_J(t) = \prod_R \frac{n - R}{n - R + 1}, \tag{4.3}$$

where the product is over the ranks $R$ of those observations $T_{0i}$, $1 \le i \le n$,
such that $T_{0i} \le t < T_{\max}$ and $T_{0i}$ corresponds to a death from at least one
cause in subset $J$. If $T_{\max} = T_{ji}$ for some $j \in J$ and some $i$, $1 \le i \le n$, then
$\hat{\bar{M}}_J(t)$ is defined to be zero for $t > T_{\max}$. Otherwise, $\hat{\bar{M}}_J(t)$ is undefined for
$t > T_{\max}$.

The estimator (4.3) is a generalization of the well-known product-limit
estimator for a survival probability proposed by Kaplan and Meier (1958). If
$T_{0i} = T_{ji}$ for each $i$, $1 \le i \le n$, and some fixed $j$, $1 \le j \le r$, then (4.3)
reduces to a step function with jumps of height $1/n$ at each $T_{0i}$, thus yielding
the usual empirical estimate of $\bar{M}_j(t)$. Assuming (A1), (A2), and (A3),
Peterson (1975) shows that the estimator (4.3) is maximum likelihood, (weakly)
consistent, and, regarded as a process in $t$, converges to a normal process.
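To make the product in (4.3) concrete, the following sketch evaluates it for a small data set. The function name and the data are hypothetical, and the code covers only the range $t < T_{\max}$ with distinct observed times, where the estimator is defined by the product alone.

```python
def net_survival_estimate(times, causes, J, t):
    """Evaluate estimator (4.3) at time t for the set of risks J.

    `times[i]` is the observed life length T_0i and `causes[i]` is the set of
    causes recorded for observation i. Assumes distinct observed times and
    t < max(times).
    """
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    estimate = 1.0
    for rank, i in enumerate(order, start=1):
        if times[i] > t:
            break
        if causes[i] & J:  # death from at least one cause in J
            estimate *= (n - rank) / (n - rank + 1)
    return estimate

# Hypothetical sample of four lifetimes with recorded causes of death.
times = [2.0, 5.0, 1.0, 4.0]
causes = [{1}, {2}, {1}, {1}]
print(net_survival_estimate(times, causes, {1}, 4.5))
print(net_survival_estimate(times, causes, {2}, 4.5))
```

When every observation is a death from a cause in $J$, the factors telescope and the product equals the fraction of observations exceeding $t$, which is the empirical survival function mentioned above.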

In the remainder of this section we drop the assumption (A1) of independent
risks. How then can we estimate the functions $\bar{M}_I(t)$, $I \in \mathcal{I}$? Note that
formula (4.2) of Theorem 4.2 exhibits a relationship between distributions
of observable quantities [viz., the survival probability $\bar{F}(t)$ and the
functions $F(t, i)$, $1 \le i \le r$] and distributions associated with the theoretical
random variables $H_I$, $I \in \mathcal{I}$, which are unobservable. Replacing $\bar{F}(t)$ and
$F(t, i)$ in (4.2) by their empirical counterparts thus allows us to estimate
the distributions $G_i$, $1 \le i \le r$, associated with the unobservable random
variables $H_i$, $1 \le i \le r$. Unfortunately, the distributions $G_i$, $1 \le i \le r$,
are, in general, different from the marginal distributions $M_i$, $1 \le i \le r$,
which we seek to estimate. The natural question then is how to relate the
unobservable (but estimable) functions $G_i$, $1 \le i \le r$, to the marginal distributions
$M_i$, $1 \le i \le r$. More generally, how can we relate the functions $\bar{M}_I$
to the survival probabilities $\bar{G}_I$ given by (4.2)?

Assuming no ties (A2) and no common discontinuities among the marginal
distributions (A3), Peterson (1975) finds necessary and sufficient conditions
for a relationship to exist between the functions $G_J$ and $M_I$. Dropping
the assumption (A2) of no ties and weakening the assumption (A3) of no common
discontinuities, LPQ (1977c) prove the following:

Theorem 4.4. Assume that the functions $F(\cdot, I)$, $I \in \mathcal{I}$, in Theorem 4.2
have no common discontinuities. Let $I \in \mathcal{I}$. Then for each
$t \in [0, \alpha(F))$,

$$M_I(t) = \prod_{J \cap I \neq \emptyset} G_J(t) \tag{4.4}$$

if and only if the following two conditions hold:

$$\frac{M_I(a)}{M_I(a^-)} =
\begin{cases}
F(a)/F(a^-), & a \in D(F(\cdot, I_1)), \\
1, & \text{otherwise},
\end{cases} \tag{4.5a}$$

and

$$P(T_{I^c} \ge t \mid T_I = t) = P(T_{I^c} > t \mid T_I > t), \tag{4.5b}$$

where $G_J(t)$ is given by (4.2) and $D(F(\cdot, I_1))$ is the set of
discontinuities of the function
$F(t, I_1) = P(T > t,\ \xi(T) \in J,\ J \cap I \neq \emptyset)$.

Desu  and  Narula  (1977)  arrive  at  a condition  similar  to  (4.5b)  when  the 
assumption  of  absolute  continuity  (A4)  and  hence  also  the  assumption  of  no  ties 
(A2)  hold.  We  remark  that  the  assumption  of  no  ties  (A2)  does  not  hold, 
e.g.,  in  models  where  failures  or  deaths  from  simultaneous  causes  can  occur. 

An  important  family  of  multivariate  distributions  for  which  assumption  (A2) 
fails  is  the  family  of  multivariate  exponential  distributions  of  Marshall  and 
Olkin  (1967).  We  illustrate  with  an  example. 

Example 4.5. For simplicity, suppose that the random vector $(T_1, T_2)$ has
the Marshall-Olkin bivariate exponential distribution with survival probability

$$P(T_1 > t_1, T_2 > t_2) = \exp[-\lambda_1 t_1 - \lambda_2 t_2 - \lambda_{12} \max(t_1, t_2)],$$

for $t_1, t_2 \ge 0$ and $\lambda_1, \lambda_2, \lambda_{12} > 0$. Since the marginal
distributions $M_1$ and $M_2$ are continuous, condition (4.5a) of Theorem 4.4
holds trivially. Condition (4.5b) with $I = \{1\}$ states that

$$P(T_2 \ge t \mid T_1 = t) = P(T_2 > t \mid T_1 > t).$$

An easy computation shows that
$P(T_2 \ge t \mid T_1 = t) = P(T_2 > t \mid T_1 > t) = \exp(-\lambda_2 t)$.

Thus,  Theorem  4.4  may  be  applied  when  the  joint  distribution  belongs  to  the 
family  of  Marshall-Olkin  MVE  distributions,  whereas  other  approaches  to  the 
estimation  problem  do  not  apply  here  since  the  assumption  of  no  ties  (A2) 
fails  to  hold. 
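
The computation behind Example 4.5 can be sketched as follows. The sketch
assumes the usual shock representation of the Marshall-Olkin model,
$T_1 = \min(U_1, U_{12})$ and $T_2 = \min(U_2, U_{12})$, where $U_1$, $U_2$,
$U_{12}$ are independent exponential random variables with rates
$\lambda_1$, $\lambda_2$, $\lambda_{12}$; the symbols $U_1$, $U_2$, $U_{12}$ are
introduced here for illustration only. On the event $\{T_1 = t\}$ we have
$U_{12} \ge t$, so that $T_2 \ge t$ if and only if $U_2 \ge t$, and $U_2$ is
independent of $(U_1, U_{12})$ and hence of $T_1$.

```latex
\begin{align*}
P(T_2 > t \mid T_1 > t)
  &= \frac{P(T_1 > t,\ T_2 > t)}{P(T_1 > t)}
   = \frac{\exp[-(\lambda_1 + \lambda_2 + \lambda_{12})t]}
          {\exp[-(\lambda_1 + \lambda_{12})t]}
   = \exp(-\lambda_2 t), \\
P(T_2 \ge t \mid T_1 = t)
  &= P(U_2 \ge t \mid T_1 = t)
   = P(U_2 \ge t)
   = \exp(-\lambda_2 t).
\end{align*}
```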

Formula (4.2), via Theorem 4.4, can now be used to suggest estimators
for the functions $M_I$, $I \in \mathcal{I}$, in the important practical cases when
independence fails to hold and when ties are allowed. Suppose that the joint
distribution of $T_1, \ldots, T_r$ satisfies (4.5a,b). For each $i = 1, \ldots, n$,
only the following are observed:

$$T_{0i} = \min(T_{1i}, \ldots, T_{ri}) \quad \text{and} \quad \xi_i,$$

where $\xi_i = J$ if and only if $T_{0i} = T_{ji}$ for each $j \in J$ and
$T_{0i} \neq T_{ji}$ for each $j \notin J$. In accordance with Theorem 4.4, we
estimate $M_I(t)$ for $t \le T_{\max}$ by

$$\hat{M}_I(t) = \prod_{J \cap I \neq \emptyset} \hat{G}_J(t),$$

where $\hat{G}_J$ is the function resulting from (4.2) by replacing $F$ and
$F(\cdot, I)$ by their empirical counterparts, $I \in \mathcal{I}$. In analogy with
(4.3), the resulting statistic can be expressed as follows:

$$\hat{M}_I(t) = \prod_{R} \frac{n - R}{n - R + 1}, \tag{4.6}$$

where the product is over the ranks $R$ of those observations $T'_{0,R}$,
$1 \le R \le n$, such that $T'_{0,R} \le t \le T_{\max}$ and $T'_{0,R}$ corresponds
to a death from the simultaneous causes $j \in J$ such that
$J \cap I \neq \emptyset$. If for some $i$, $T_{\max} = T_{ji}$ for each $j \in J$ with
$J \cap I \neq \emptyset$, then $\hat{M}_I(t)$ is defined to be zero for
$t > T_{\max}$. Otherwise, $\hat{M}_I(t)$ is undefined for $t > T_{\max}$.
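
As a rough numerical illustration of (4.6), the following Python sketch
applies the estimator to data simulated from the bivariate model of
Example 4.5, where simultaneous deaths occur with positive probability. The
function names, the parameter values, and the use of the shock
representation $T_1 = \min(U_1, U_{12})$, $T_2 = \min(U_2, U_{12})$ to generate
data are assumptions made for this sketch only. Observation times of
different subjects are assumed distinct.

```python
import math
import random


def generalized_product_limit(observations, I, t):
    """Evaluate the product in (4.6) at time t.

    observations : list of (time, causes) pairs, where causes is the set
                   of simultaneous causes of death for that subject
    I            : set of causes defining the function M_I
    Returns None where the estimator is undefined (t beyond the largest
    observation when that death involves no cause in I).
    """
    n = len(observations)
    ordered = sorted(observations, key=lambda obs: obs[0])
    last_time, last_causes = ordered[-1]
    if t > last_time and not (set(last_causes) & set(I)):
        return None
    estimate = 1.0
    for rank, (time, causes) in enumerate(ordered, start=1):
        if time > t:
            break
        if set(causes) & set(I):
            estimate *= (n - rank) / (n - rank + 1)
    return estimate


def simulate_marshall_olkin(n, lam1, lam2, lam12, rng):
    """Draw (T_0, xi) pairs using the shock representation (an assumption)."""
    sample = []
    for _ in range(n):
        u1, u2, u12 = (rng.expovariate(lam) for lam in (lam1, lam2, lam12))
        t1, t2 = min(u1, u12), min(u2, u12)
        t0 = min(t1, t2)
        # Both causes are recorded when the common shock arrives first.
        causes = {j for j, tj in ((1, t1), (2, t2)) if tj == t0}
        sample.append((t0, causes))
    return sample


rng = random.Random(0)
data = simulate_marshall_olkin(4000, 1.0, 0.5, 0.7, rng)
estimate = generalized_product_limit(data, {1}, 0.4)
target = math.exp(-(1.0 + 0.7) * 0.4)  # M_1(0.4) under the assumed model
print(estimate, target)
```

Under these assumptions the estimate should lie close to the marginal
survival probability $M_1(t) = \exp[-(\lambda_1 + \lambda_{12})t]$, although a
single simulated sample is of course no substitute for the formal
properties discussed next.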

Optimality  properties  of  the  estimator  (4.6)  readily  suggest  themselves 
because  of  its  resemblance  to  the  (generalized)  Kaplan-Meier  estimator  (4.3) 
and  its  reliance  upon  empirical  distributions.  Moreover,  there  is  evidence 
that  formula  (4.2)  can  be  used  in  a similar  way  to  estimate  other  quantities 
of  interest  to  the  biomedical  researcher. 

